Marketing Channel ROI Optimization & Budget Allocation¶
Dataset: digital_marketing_campaign_dataset.csv (Kaggle: Predict Conversion in Digital Marketing, loaded via kagglehub below)
This notebook:
- Cleans and explores real user-level marketing data.
- Runs an A/B test for `CampaignType` (Awareness vs Conversion) using a two-proportion Z-test.
- Mitigates selection bias via Propensity Score Matching (PSM) and re-tests significance.
- Computes average ROI and CAC by channel (AOV assumed at $100).
- Reconstructs campaign-level data to fit Marginal ROI (M-ROI) quadratic models.
- Compares Average ROI vs M-ROI and gives budget recommendations.
Important: This is an observational dataset. Any causal interpretation requires controls (PSM) and assumptions.
# Step 1: Data Import & Cleaning
import pandas as pd
import numpy as np
import plotly.express as px
from statsmodels.stats.proportion import proportions_ztest
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import pairwise_distances
import warnings
warnings.filterwarnings('ignore')
AOV = 100 # Assumed Average Order Value ($)
np.random.seed(42)
# Install dependencies as needed (quote the extra so zsh doesn't glob the brackets):
%pip install "kagglehub[pandas-datasets]"
import kagglehub
from kagglehub import KaggleDatasetAdapter
# Set the path to the file you'd like to load
file_path = "digital_marketing_campaign_dataset.csv"
# Load the latest version
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "rabieelkharoua/predict-conversion-in-digital-marketing-dataset",
    file_path,
)
print("First 5 records:")
print(df.head())
Note: you may need to restart the kernel to use updated packages.
First 5 records:
   CustomerID  Age  Gender  Income CampaignChannel CampaignType      AdSpend  \
0        8000   56  Female  136912    Social Media    Awareness  6497.870068
1        8001   69    Male   41760           Email    Retention  3898.668606
2        8002   46  Female   88456             PPC    Awareness  1546.429596
3        8003   32  Female   44085             PPC   Conversion   539.525936
4        8004   60  Female   83964             PPC   Conversion  1678.043573

   ClickThroughRate  ConversionRate  WebsiteVisits  PagesPerVisit  TimeOnSite  \
0          0.043919        0.088031              0       2.399017    7.396803
1          0.155725        0.182725             42       2.917138    5.352549
2          0.277490        0.076423              2       8.223619   13.794901
3          0.137611        0.088004             47       4.540939   14.688363
4          0.252851        0.109940              0       2.046847   13.993370

   SocialShares  EmailOpens  EmailClicks  PreviousPurchases  LoyaltyPoints  \
0            19           6            9                  4            688
1             5           2            7                  2           3459
2             0          11            2                  8           2337
3            89           2            2                  0           2463
4             6           6            6                  8           4345

  AdvertisingPlatform AdvertisingTool  Conversion
0            IsConfid      ToolConfid           1
1            IsConfid      ToolConfid           1
2            IsConfid      ToolConfid           1
3            IsConfid      ToolConfid           1
4            IsConfid      ToolConfid           1
# Basic sanity checks
print('\nMissing values by column:')
print(df.isna().sum().sort_values(ascending=False).head(20))
print('\nDuplicate CustomerID count:', df.duplicated('CustomerID').sum())
print('Unique customers:', df['CustomerID'].nunique())
# Minimal cleaning: drop rows missing critical fields for core analyses
critical_cols = ['CustomerID','CampaignChannel','CampaignType','AdSpend','Conversion','Age','Income','WebsiteVisits']
df = df.dropna(subset=critical_cols).copy()
# Enforce types
df['Conversion'] = df['Conversion'].astype(int)
for col in ['AdSpend','Age','Income','WebsiteVisits']:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df = df.dropna(subset=['AdSpend','Age','Income','WebsiteVisits'])
df = df[df['AdSpend'] >= 0]
print('Post-clean shape:', df.shape)
Missing values by column:
CustomerID             0
Age                    0
AdvertisingTool        0
AdvertisingPlatform    0
LoyaltyPoints          0
PreviousPurchases      0
EmailClicks            0
EmailOpens             0
SocialShares           0
TimeOnSite             0
PagesPerVisit          0
WebsiteVisits          0
ConversionRate         0
ClickThroughRate       0
AdSpend                0
CampaignType           0
CampaignChannel        0
Income                 0
Gender                 0
Conversion             0
dtype: int64

Duplicate CustomerID count: 0
Unique customers: 8000
Post-clean shape: (8000, 20)
Step 2: Exploratory Data Analysis (EDA)¶
- Overall conversion rate and user profiles.
- Channel-level spend and conversions.
- Cross-profiles by channel (Age, Income).
overall_conv_rate = df['Conversion'].mean()
print(f'Overall conversion rate: {overall_conv_rate:.3f}')
# Age distribution
fig_age = px.histogram(df, x='Age', nbins=30, title='Age Distribution')
fig_age.show()
Overall conversion rate: 0.876
# Income distribution
fig_inc = px.histogram(df, x='Income', nbins=30, title='Income Distribution')
fig_inc.show()
# Channel-level AdSpend and Conversion
channel_agg = df.groupby('CampaignChannel').agg(
    TotalAdSpend=('AdSpend','sum'),
    Conversions=('Conversion','sum'),
    Users=('CustomerID','nunique')
).reset_index()
channel_agg['ConvRate'] = channel_agg['Conversions'] / channel_agg['Users']
fig_spend = px.bar(channel_agg, x='CampaignChannel', y='TotalAdSpend', title='Total AdSpend by Channel')
fig_spend.show()
fig_conv = px.bar(channel_agg, x='CampaignChannel', y='Conversions', title='Total Conversions by Channel')
fig_conv.show()
# Cross-profile by channel
fig_age_ch = px.box(df, x='CampaignChannel', y='Age', title='Age by CampaignChannel')
fig_age_ch.show()
fig_inc_ch = px.box(df, x='CampaignChannel', y='Income', title='Income by CampaignChannel')
fig_inc_ch.show()
Step 3: A/B Testing (Quasi-Experimental)¶
Key Point: This is an observational dataset. Users may self-select or be targeted differently, causing selection bias.
We compare Conversion rates between Awareness (A) and Conversion (B). Then we apply Propensity Score Matching (PSM) using covariates (Age, Gender, Income, WebsiteVisits) and repeat the Z-test on the matched sample.
# Split groups
A_label, B_label = 'Awareness', 'Conversion'
df_A = df[df['CampaignType'] == A_label].copy()
df_B = df[df['CampaignType'] == B_label].copy()
print('A size:', len(df_A), 'B size:', len(df_B))
# Two-proportion Z-test on raw groups
conv_A_sum = df_A['Conversion'].sum()
conv_B_sum = df_B['Conversion'].sum()
n_A = len(df_A)
n_B = len(df_B)
z_raw, p_raw = proportions_ztest([conv_B_sum, conv_A_sum], [n_B, n_A], alternative='larger')
print(f'Raw Z={z_raw:.3f}, p-value={p_raw:.4f}')
print(f'ConvRate A={conv_A_sum/n_A:.3f}, B={conv_B_sum/n_B:.3f}')
A size: 1988 B size: 2077
Raw Z=8.117, p-value=0.0000
ConvRate A=0.856, B=0.934
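The pooled z statistic behind `proportions_ztest` can be reproduced by hand. A minimal sketch with toy counts (not the notebook's A/B groups):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Toy counts, not the notebook's groups
count = np.array([180, 150])   # conversions in B, A
nobs = np.array([200, 200])    # group sizes

# Pooled two-proportion z statistic (what proportions_ztest computes by default)
p_pool = count.sum() / nobs.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / nobs[0] + 1 / nobs[1]))
z_manual = (count[0] / nobs[0] - count[1] / nobs[1]) / se

z_sm, p_sm = proportions_ztest(count, nobs, alternative='larger')
print(z_manual, z_sm)  # the two should agree
```

With `alternative='larger'` the test is one-sided: it asks whether the first group's rate exceeds the second's, matching how the notebook passes B before A.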
Propensity Score Matching (PSM)¶
We estimate the probability of being in B (CampaignType == Conversion) with a logistic regression on covariates, then perform nearest-neighbor matching on the propensity score with a caliper.
# Prepare data for PSM
df_psm = df[df['CampaignType'].isin([A_label, B_label])].copy()
df_psm['T'] = (df_psm['CampaignType'] == B_label).astype(int)
# Encode Gender as a simple binary (unmapped values coded as 0)
df_psm['Gender_bin'] = df_psm['Gender'].map({'Male':1,'Female':0}).fillna(0)
covars = ['Age','Income','WebsiteVisits','Gender_bin']
X = df_psm[covars].copy()
y = df_psm['T']
preprocess = ColumnTransformer([
    ('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),
                            ('scaler', StandardScaler())]),
     ['Age','Income','WebsiteVisits']),
    ('bin', 'passthrough', ['Gender_bin'])
])
logit = LogisticRegression(max_iter=1000)
psm_pipe = Pipeline([('prep', preprocess), ('logit', logit)])
psm_pipe.fit(X, y)
ps = psm_pipe.predict_proba(X)[:,1]
df_psm['pscore'] = ps
# Nearest-neighbor matching on pscore (1:1), simple caliper
treated = df_psm[df_psm['T']==1].copy()
control = df_psm[df_psm['T']==0].copy()
caliper = 0.05
nbrs = NearestNeighbors(n_neighbors=1).fit(control[['pscore']])
dist, idx = nbrs.kneighbors(treated[['pscore']])
match_pairs = []
used_controls = set()
for i, (d, j) in enumerate(zip(dist.flatten(), idx.flatten())):
    if d <= caliper:
        ctrl_index = control.iloc[j].name
        if ctrl_index not in used_controls:
            match_pairs.append((treated.iloc[i].name, ctrl_index))
            used_controls.add(ctrl_index)
matched_t_idx = [t for t, _ in match_pairs]
matched_c_idx = [c for _, c in match_pairs]
df_matched = pd.concat([
    df_psm.loc[matched_t_idx],
    df_psm.loc[matched_c_idx]
])
print('Matched sample size:', df_matched.shape[0])
# Re-run Z-test after PSM
mA = df_matched[df_matched['T']==0]
mB = df_matched[df_matched['T']==1]
z_psm, p_psm = proportions_ztest([mB['Conversion'].sum(), mA['Conversion'].sum()], [len(mB), len(mA)], alternative='larger')
print(f'PSM Z={z_psm:.3f}, p-value={p_psm:.4f}')
print(f'PSM ConvRate A={mA["Conversion"].mean():.3f}, B={mB["Conversion"].mean():.3f}')
Matched sample size: 2240
PSM Z=7.129, p-value=0.0000
PSM ConvRate A=0.850, B=0.942
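Matching quality is usually verified with a covariate balance check, not only the re-test. The notebook does not compute this, but a minimal standardized-mean-difference (SMD) helper, sketched here on synthetic stand-in data, shows the idea:

```python
import numpy as np
import pandas as pd

def smd(treated: pd.Series, control: pd.Series) -> float:
    """Standardized mean difference; |SMD| < 0.1 is a common balance rule of thumb."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Synthetic stand-ins for one matched covariate (e.g. Age in the two matched groups)
rng = np.random.default_rng(0)
t = pd.Series(rng.normal(40, 10, 500))
c = pd.Series(rng.normal(40, 10, 500))
print(f"SMD = {smd(t, c):.3f}")  # near zero when groups are balanced
```

In the notebook this would be applied per covariate on the matched sample, e.g. `smd(mB['Age'], mA['Age'])`.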
Step 4: ROI & CAC Analysis (Average Returns)¶
Assumption: Average Order Value (AOV) = $100.
Formulas:
- TotalRevenue = TotalConversions × AOV ($100)
- ROI = (TotalRevenue − TotalAdSpend) / TotalAdSpend
- CAC = TotalAdSpend / TotalConversions
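A quick worked instance of the three formulas, using round toy numbers of the same magnitude as this dataset:

```python
AOV = 100                      # assumed Average Order Value ($)
total_conversions = 1_500
total_adspend = 8_000_000.0

total_revenue = total_conversions * AOV                 # $150,000
roi = (total_revenue - total_adspend) / total_adspend   # -0.98125
cac = total_adspend / total_conversions                 # ~$5,333 per conversion
print(f"ROI = {roi:.4f}, CAC = ${cac:,.2f}")
```

At spend levels in the millions, a $100 AOV cannot pay back, which is why every channel's average ROI sits near −0.98.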
roi_by_channel = df.groupby('CampaignChannel').agg(
    TotalAdSpend=('AdSpend','sum'),
    Conversions=('Conversion','sum'),
    Users=('CustomerID','nunique')
).reset_index()
roi_by_channel['TotalRevenue'] = roi_by_channel['Conversions'] * AOV
roi_by_channel['ROI'] = (roi_by_channel['TotalRevenue'] - roi_by_channel['TotalAdSpend']) / roi_by_channel['TotalAdSpend']
roi_by_channel['CAC'] = roi_by_channel['TotalAdSpend'] / roi_by_channel['Conversions'].replace(0, np.nan)
roi_by_channel
| | CampaignChannel | TotalAdSpend | Conversions | Users | TotalRevenue | ROI | CAC |
|---|---|---|---|---|---|---|---|
| 0 | Email | 7.871576e+06 | 1355 | 1557 | 135500 | -0.982786 | 5809.281071 |
| 1 | PPC | 8.199237e+06 | 1461 | 1655 | 146100 | -0.982181 | 5612.071856 |
| 2 | Referral | 8.653519e+06 | 1518 | 1719 | 151800 | -0.982458 | 5700.605195 |
| 3 | SEO | 7.740904e+06 | 1359 | 1550 | 135900 | -0.982444 | 5696.029342 |
| 4 | Social Media | 7.542323e+06 | 1319 | 1519 | 131900 | -0.982512 | 5718.213229 |
fig_roi = px.bar(roi_by_channel, x='CampaignChannel', y='ROI', title='Average ROI by Channel')
fig_roi.show()
fig_cac = px.bar(roi_by_channel, x='CampaignChannel', y='CAC', title='CAC by Channel')
fig_cac.show()
Step 5: Marginal ROI (M-ROI) Modeling¶
Reconstruct campaign-level data using user-level rows. Note that AdSpend, ClickThroughRate, and ConversionRate appear repeated at the user-level—they are campaign attributes. We group by campaign keys to build campaign-level observations.
# Define campaign keys (proxy for unique campaigns)
campaign_keys = ['CampaignChannel','CampaignType','AdSpend']
df_campaign_agg = df.groupby(campaign_keys).agg(
    TotalConversions=('Conversion','sum'),
    TotalUsers=('CustomerID','nunique')
).reset_index()
df_campaign_agg['TotalRevenue'] = df_campaign_agg['TotalConversions'] * AOV
df_campaign_agg.head()
| | CampaignChannel | CampaignType | AdSpend | TotalConversions | TotalUsers | TotalRevenue |
|---|---|---|---|---|---|---|
| 0 | Email | Awareness | 154.086910 | 1 | 1 | 100 |
| 1 | Email | Awareness | 158.583632 | 0 | 1 | 0 |
| 2 | Email | Awareness | 183.243466 | 1 | 1 | 100 |
| 3 | Email | Awareness | 230.987601 | 1 | 1 | 100 |
| 4 | Email | Awareness | 242.547848 | 1 | 1 | 100 |
Fit Quadratic Revenue ~ AdSpend per Channel¶
We expect a negative quadratic term (β₂ < 0) if a channel experiences diminishing returns at higher spend levels.
def fit_quadratic(sub):
    X = pd.DataFrame({'const': 1, 'AdSpend': sub['AdSpend'], 'AdSpend2': sub['AdSpend']**2})
    y = sub['TotalRevenue']
    model = sm.OLS(y, X).fit()
    return model
models = {}
for ch in df_campaign_agg['CampaignChannel'].unique():
    sub = df_campaign_agg[df_campaign_agg['CampaignChannel']==ch]
    if sub['AdSpend'].nunique() >= 3:  # need variance for quadratic fit
        models[ch] = fit_quadratic(sub)
        print(f'Channel={ch}\n', models[ch].summary().tables[1])
    else:
        print(f'Channel={ch}: not enough distinct AdSpend points for quadratic fit.')
Channel=Email
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 79.3828 2.776 28.593 0.000 73.937 84.828
AdSpend 0.0019 0.001 1.521 0.128 -0.001 0.004
AdSpend2 -5.692e-08 1.18e-07 -0.483 0.629 -2.88e-07 1.74e-07
==============================================================================
Channel=PPC
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 80.1566 2.392 33.514 0.000 75.465 84.848
AdSpend 0.0021 0.001 1.825 0.068 -0.000 0.004
AdSpend2 -6.214e-08 1.09e-07 -0.568 0.570 -2.77e-07 1.52e-07
==============================================================================
Channel=Referral
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 80.8806 2.435 33.219 0.000 76.105 85.656
AdSpend 0.0019 0.001 1.753 0.080 -0.000 0.004
AdSpend2 -6.944e-08 1.06e-07 -0.657 0.511 -2.77e-07 1.38e-07
==============================================================================
Channel=SEO
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 81.2586 2.532 32.091 0.000 76.292 86.225
AdSpend 0.0007 0.001 0.555 0.579 -0.002 0.003
AdSpend2 9.474e-08 1.15e-07 0.827 0.408 -1.3e-07 3.19e-07
==============================================================================
Channel=Social Media
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 78.5166 2.703 29.046 0.000 73.214 83.819
AdSpend 0.0017 0.001 1.336 0.182 -0.001 0.004
AdSpend2 1.948e-09 1.21e-07 0.016 0.987 -2.34e-07 2.38e-07
==============================================================================
df_roi = df_campaign_agg.copy()
df_roi = df_roi[df_roi["AdSpend"] > 0]
df_roi["ROI"] = (df_roi["TotalRevenue"] - df_roi["AdSpend"]) / df_roi["AdSpend"]
for ch, sub in df_roi.groupby("CampaignChannel"):
    fig_roi_scatter = px.scatter(
        sub,
        x="AdSpend",
        y="ROI",
        trendline="lowess",
        opacity=0.5,
        color_discrete_sequence=["#1f77b4"],
        title=f"{ch}: ROI vs AdSpend (LOWESS trend)",
    )
    fig_roi_scatter.update_traces(marker=dict(size=5))
    ymin = min(-1.05, sub["ROI"].min() - 0.05)
    ymax = max(0.10, sub["ROI"].max() + 0.05)
    fig_roi_scatter.update_layout(
        template="plotly_white",
        yaxis=dict(title="ROI", zeroline=True, zerolinewidth=1, zerolinecolor="gray", range=[ymin, ymax]),
        xaxis_title="AdSpend ($)",
        title_font=dict(size=18, family="Arial"),
        font=dict(size=12),
        width=850,
        height=400,
    )
    fig_roi_scatter.add_hline(y=0, line_dash="dot", line_color="green", annotation_text="Break-even (ROI=0)")
    fig_roi_scatter.add_hline(y=-1, line_dash="dot", line_color="red", annotation_text="Loss limit (ROI=-1)")
    fig_roi_scatter.show()
Compute Marginal ROI (M-ROI)¶
Given the quadratic model:
$$ \text{Revenue} = \beta_0 + \beta_1 \cdot S + \beta_2 \cdot S^2 $$
the marginal ROI at spend level (S) is:
$$ M\text{-}ROI(S) = \beta_1 + 2\beta_2 S. $$
We will compute M-ROI at the current average spend by channel, and compare to average ROI.
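The closed form is just the derivative of the quadratic; a quick numeric check with toy coefficients of roughly the fitted magnitudes (not the actual model parameters):

```python
# Toy coefficients, similar in scale to the fitted models
b0, b1, b2 = 80.0, 0.002, -6e-8
S = 5000.0                                  # spend level near the channel averages

revenue = lambda s: b0 + b1 * s + b2 * s ** 2
h = 1e-3
mroi_numeric = (revenue(S + h) - revenue(S - h)) / (2 * h)   # central difference
mroi_closed = b1 + 2 * b2 * S                                # beta1 + 2*beta2*S
print(mroi_closed)  # equals the numeric derivative up to floating-point error
```

Note how small these slopes are relative to the break-even value of 1: the diminishing-returns story matters far less here than the overall unprofitability.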
# Average ROI table (from Step 4) is roi_by_channel
avg_spend_by_channel = df_campaign_agg.groupby('CampaignChannel')['AdSpend'].mean().rename('AvgSpend').reset_index()
rows = []
for ch in roi_by_channel['CampaignChannel']:
    avg_roi = roi_by_channel.loc[roi_by_channel['CampaignChannel']==ch, 'ROI'].values[0]
    avg_spend = avg_spend_by_channel.loc[avg_spend_by_channel['CampaignChannel']==ch, 'AvgSpend'].values
    avg_spend = float(avg_spend[0]) if len(avg_spend) > 0 else np.nan
    if ch in models:
        b1 = models[ch].params['AdSpend']
        b2 = models[ch].params['AdSpend2']
        mroi = b1 + 2*b2*avg_spend
    else:
        mroi = np.nan
    rows.append({'CampaignChannel': ch, 'AverageROI': avg_roi, 'AvgSpend': avg_spend, 'MROI_at_AvgSpend': mroi})
compare_df = pd.DataFrame(rows)
compare_df
| | CampaignChannel | AverageROI | AvgSpend | MROI_at_AvgSpend |
|---|---|---|---|---|
| 0 | Email | -0.982786 | 5055.604272 | 0.001311 |
| 1 | PPC | -0.982181 | 4954.221741 | 0.001435 |
| 2 | Referral | -0.982458 | 5034.042284 | 0.001237 |
| 3 | SEO | -0.982444 | 4994.131533 | 0.001602 |
| 4 | Social Media | -0.982512 | 4965.321428 | 0.001682 |
cd = compare_df.copy()
x_pad = 0.001
x_min, x_max = cd["AverageROI"].min() - x_pad, cd["AverageROI"].max() + x_pad
y_vals = cd["MROI_at_AvgSpend"].values
y_span = y_vals.max() - y_vals.min()
pad = max(0.0006, y_span * 1.5)
y_min, y_max = y_vals.min() - pad, y_vals.max() + pad
fig = px.scatter(
    cd,
    x="AverageROI", y="MROI_at_AvgSpend",
    color="CampaignChannel", symbol="CampaignChannel",
    size="AvgSpend", size_max=22,
    text="CampaignChannel",
    hover_data={"AverageROI":":.6f", "MROI_at_AvgSpend":":.6f", "AvgSpend":":.0f"},
    title="Average ROI vs M-ROI (zoomed)"
)
fig.update_traces(textposition="top center")
fig.update_xaxes(title="Average ROI", range=[x_min, x_max], tickformat=".5f", zeroline=True)
fig.update_yaxes(title="M-ROI @ avg spend", range=[y_min, y_max], tickformat=".4f", zeroline=True)
fig.add_hline(y=1, line_dash="dot", line_color="gray")  # M-ROI break-even ($1 back per extra $1)
fig.add_vline(x=0, line_dash="dot", line_color="gray")  # Average ROI break-even
fig.update_layout(template="plotly_white", height=520)
fig.show()
Step 6: Business Recommendations & Budget Optimization¶
Goal. Turn statistical findings into an actionable budget plan using both Average ROI and Marginal ROI (M-ROI).
6.1 Experiment decision (A/B test → rollout?)¶
- We tested `CampaignType = Conversion` vs `Awareness` using a two-proportion Z-test.
- Because this is observational data, we repeated the test after Propensity Score Matching (PSM) on (Age, Gender, Income, WebsiteVisits).
Decision rule.
- If `p_raw < 0.05` and `p_psm < 0.05` (uplift stable post-PSM): roll out Conversion more broadly.
- If only the raw test is significant but the PSM test is not: maintain an A/B holdout and collect more data (selection bias likely).
- If neither is significant: do not roll out.
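The three branches above can be encoded as a tiny helper; the function name and the alpha = 0.05 default are illustrative, not from the notebook:

```python
def rollout_decision(p_raw: float, p_psm: float, alpha: float = 0.05) -> str:
    """Map raw and post-PSM p-values onto the decision rule above."""
    if p_raw < alpha and p_psm < alpha:
        return "roll out Conversion"            # uplift stable after PSM
    if p_raw < alpha:
        return "hold out & collect more data"   # raw-only: selection bias likely
    return "do not roll out"

print(rollout_decision(2.4e-16, 5.1e-13))  # this notebook's p-values land in the first branch
```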
# --- 6.1 Print A/B decision facts (assumes earlier variables exist) ---
try:
    print("A/B test (raw): Z = %.3f, p = %.4g" % (z_raw, p_raw))
    print("A/B test (PSM): Z = %.3f, p = %.4g" % (z_psm, p_psm))
    print("Raw rates: A = %.3f, B = %.3f" % (conv_A_sum/len(df_A), conv_B_sum/len(df_B)))
    print("PSM rates: A = %.3f, B = %.3f" % (mA['Conversion'].mean(), mB['Conversion'].mean()))
except NameError:
    print("A/B numbers not in scope. (This does not affect the budget section.)")
A/B test (raw): Z = 8.117, p = 2.394e-16
A/B test (PSM): Z = 7.129, p = 5.067e-13
Raw rates: A = 0.856, B = 0.934
PSM rates: A = 0.850, B = 0.942
6.2 Read the ROI vs M-ROI matrix¶
- Average ROI answers: “On average, did the channel pay back?”
- M-ROI (the slope of `Revenue(Spend)`) answers: "If I add one more $, how many $ of revenue do I get right now?"
  - M-ROI > 1 ⇒ incremental dollars are profitable (revenue > spend).
  - M-ROI < 1 ⇒ over-investment at the margin (each extra $ loses money).
# --- 6.2 ROI vs M-ROI matrix (compare_df expected from Step 5) ---
if 'compare_df' in globals():
    display(compare_df[['CampaignChannel','AverageROI','AvgSpend','MROI_at_AvgSpend']])
    fig = px.scatter(
        compare_df,
        x="AverageROI", y="MROI_at_AvgSpend",
        color="CampaignChannel", size="AvgSpend", size_max=22,
        title="Average ROI vs M-ROI (zoomed)"
    )
    # zoom so points don't overlap
    x_pad = 0.001
    xs = compare_df["AverageROI"].values
    fig.update_xaxes(range=[xs.min()-x_pad, xs.max()+x_pad], tickformat=".5f", title="Average ROI")
    ys = compare_df["MROI_at_AvgSpend"].values
    y_span = ys.max() - ys.min()
    pad = max(0.0006, y_span * 1.5)
    fig.update_yaxes(range=[ys.min()-pad, ys.max()+pad], tickformat=".4f", title="M-ROI @ avg spend")
    fig.update_layout(template="plotly_white", height=520)
    fig.add_hline(y=1, line_dash="dot", line_color="gray")  # M-ROI break-even
    fig.add_vline(x=0, line_dash="dot", line_color="gray")  # Average ROI break-even
    fig.show()
else:
    print("compare_df not found; please run Step 5 before Step 6.")
| | CampaignChannel | AverageROI | AvgSpend | MROI_at_AvgSpend |
|---|---|---|---|---|
| 0 | Email | -0.982786 | 5055.604272 | 0.001311 |
| 1 | PPC | -0.982181 | 4954.221741 | 0.001435 |
| 2 | Referral | -0.982458 | 5034.042284 | 0.001237 |
| 3 | SEO | -0.982444 | 4994.131533 | 0.001602 |
| 4 | Social Media | -0.982512 | 4965.321428 | 0.001682 |
6.3 Budget policy and why we changed the method¶
Important modeling note: In Step 5 we fit quadratics at the campaign level (TotalRevenue vs AdSpend per campaign). Using those same coefficients at channel-total scale (millions of dollars) violates scale consistency for nonlinear models, i.e. f(sum S_i) ≠ sum f(S_i). This can produce nonsense negative “lifts” when simulating large reallocations.
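The scale-consistency point is easy to demonstrate with a toy quadratic of the fitted magnitude (illustrative coefficients, not the actual estimates): applying campaign-level coefficients to a channel total is not the same as summing campaign-level predictions.

```python
# Toy net-revenue curve with a campaign-scale quadratic term
f = lambda s, b1=0.002, b2=-6e-8: b1 * s + b2 * s ** 2

spends = [4000.0, 5000.0, 6000.0]        # three campaigns in one channel
sum_of_f = sum(f(s) for s in spends)     # predict per campaign, then sum
f_of_sum = f(sum(spends))                # apply the same coefficients to the total
print(sum_of_f, f_of_sum)                # ~25.38 vs ~16.5: the quadratic term breaks additivity
```

At channel totals in the millions the quadratic term dominates and flips sign, which is exactly the "nonsense negative lift" failure mode described above.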
Default method (robust): Local, first-order reallocation¶
We optimize small rotations of budget using current M-ROIs only (no large extrapolation):
$$ \Delta \text{Revenue} \approx \big(\text{M-ROI}_\text{to} - \text{M-ROI}_\text{from}\big)\cdot \Delta $$

subject to:
- Constant total budget (rotation cap e.g. 5–20%),
- Learning floors (keep ≥5–10% of each channel’s current spend),
- Steps in small increments (e.g., 1% of total) to stay local.
Outputs in this notebook:
- `plan_local` — list of concrete transfers (from → to, amount, estimated first-order gain).
- `alloc_df` — new per-channel spends (constant total) and deltas vs current.
- A first-order estimated lift for the chosen rotation cap.
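For intuition, here is one transfer under the first-order formula, using the M-ROI magnitudes from `compare_df` (values rounded, so the result is approximate):

```python
# First-order gain of rotating budget from the lowest- to the highest-M-ROI channel
mroi_from = 0.001237       # Referral (rounded from compare_df)
mroi_to = 0.001682         # Social Media (rounded from compare_df)
delta = 4_000_756.0        # dollars moved (~10% of the total budget)

est_gain = (mroi_to - mroi_from) * delta
print(f"Estimated first-order revenue gain: ${est_gain:,.0f}")  # ~$1,780
```

A gain of roughly $1,780 on a $4M rotation underlines the headline finding: reallocation barely helps when every channel is deeply unprofitable on average.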
cur_totals = df_campaign_agg.groupby('CampaignChannel')['AdSpend'].sum().rename('CurrentSpend')
mroi = compare_df.set_index('CampaignChannel')['MROI_at_AvgSpend'].copy()
# Config: how much budget are we allowed to shuffle (e.g., 10% of total)?
TOTAL = cur_totals.sum()
shift_cap = 0.10 * TOTAL # allow at most 10% rotation
step = 0.01 * TOTAL # move in 1% steps to keep it local
min_floor = 0.05 # keep at least 5% of each channel's current spend
alloc = cur_totals.copy().astype(float)
moved = 0.0
records = []
while moved + 1e-9 < shift_cap:
    donor = mroi.idxmin()
    recv = mroi.idxmax()
    if mroi[recv] <= mroi[donor] + 1e-12:  # no first-order gain left
        break
    # floor is defined against *current* spend, matching the guardrail check in 6.4
    max_out = alloc[donor] - min_floor * cur_totals[donor]
    if max_out <= 0:
        # can't take from this donor; set its MROI high so we skip it
        mroi[donor] = np.inf
        continue
    delta = min(step, shift_cap - moved, max_out)
    if delta <= 0:
        break
    alloc[donor] -= delta
    alloc[recv] += delta
    gain = (mroi[recv] - mroi[donor]) * delta  # first-order revenue gain
    moved += delta
    records.append({
        'from': donor, 'to': recv, 'delta': delta, 'estimated_gain': gain,
        'MROI_from': mroi[donor], 'MROI_to': mroi[recv]
    })
plan_local = pd.DataFrame(records)
net_moves = (plan_local.groupby(['from','to'], as_index=False)
             .agg(total_delta=('delta','sum'),
                  est_gain=('estimated_gain','sum'),
                  steps=('delta','count')))
net_moves = net_moves.sort_values('total_delta', ascending=False)
print("Net transfer summary (aggregated):")
display(net_moves)
print("\nNew allocated spends (keeping total constant):")
alloc_df = alloc.rename('ReallocatedSpend').reset_index().rename(columns={'index':'CampaignChannel'})
alloc_df = alloc_df.merge(cur_totals.rename('CurrentSpend'), on='CampaignChannel', how='left')
alloc_df['Delta_vs_Current'] = alloc_df['ReallocatedSpend'] - alloc_df['CurrentSpend']
display(alloc_df)
if len(plan_local):
    print(f"\nFirst-order estimated revenue lift for {shift_cap/TOTAL:.0%} rotation: "
          f"{plan_local['estimated_gain'].sum():,.0f}")
else:
    print("\nNo profitable first-order move given current M-ROIs.")
Net transfer summary (aggregated):
| | from | to | total_delta | est_gain | steps |
|---|---|---|---|---|---|
| 0 | Referral | Social Media | 4.000756e+06 | 1780.185773 | 10 |
New allocated spends (keeping total constant):
| | CampaignChannel | ReallocatedSpend | CurrentSpend | Delta_vs_Current |
|---|---|---|---|---|
| 0 | Email | 7.871576e+06 | 7.871576e+06 | 0.000000e+00 |
| 1 | PPC | 8.199237e+06 | 8.199237e+06 | 0.000000e+00 |
| 2 | Referral | 4.652763e+06 | 8.653519e+06 | -4.000756e+06 |
| 3 | SEO | 7.740904e+06 | 7.740904e+06 | 0.000000e+00 |
| 4 | Social Media | 1.154308e+07 | 7.542323e+06 | 4.000756e+06 |
First-order estimated revenue lift for 10% rotation: 1,780
6.4 Guardrails & Handoff¶
We validate the plan with a minimum-spend floor per channel (learning/coverage), visualize budget deltas, and export the plan for sign-off. Sensitivity and CUPED are omitted for clarity.
assert 'alloc_df' in globals(), "Run Step 6.3 first to generate alloc_df."
# 1) Guardrail check: min floor per channel
MIN_FLOOR_RATIO = 0.05 # keep ≥5% of current spend
guard = alloc_df.copy()
guard["MinFloor"] = guard["CurrentSpend"] * MIN_FLOOR_RATIO
guard["FloorViolated"] = guard["ReallocatedSpend"] < guard["MinFloor"]
print("Guardrail check (Min spend floor):")
display(guard[["CampaignChannel","CurrentSpend","ReallocatedSpend","MinFloor","FloorViolated"]])
# 2) Budget deltas: clean bar chart
fig_delta = px.bar(
    alloc_df.sort_values('Delta_vs_Current'),
    x='CampaignChannel', y='Delta_vs_Current',
    title='Budget Changes by Channel (Δ vs Current)',
    labels={'Delta_vs_Current':'Δ Spend'},
)
fig_delta.update_layout(template='plotly_white', height=420)
fig_delta.add_hline(y=0, line_dash='dot', line_color='gray')
fig_delta.show()
# 3) Export handoff tables (for sign-off)
import os  # needed for makedirs
os.makedirs("outputs/tables", exist_ok=True)
alloc_df.to_csv("outputs/tables/budget_plan.csv", index=False)
if 'compare_df' in globals():
    compare_df.to_csv("outputs/tables/roi_mroi_matrix.csv", index=False)
print("Saved: outputs/tables/budget_plan.csv")
if 'compare_df' in globals():
    print("Saved: outputs/tables/roi_mroi_matrix.csv")
Guardrail check (Min spend floor):
| | CampaignChannel | CurrentSpend | ReallocatedSpend | MinFloor | FloorViolated |
|---|---|---|---|---|---|
| 0 | Email | 7.871576e+06 | 7.871576e+06 | 393578.792594 | False |
| 1 | PPC | 8.199237e+06 | 8.199237e+06 | 409961.849056 | False |
| 2 | Referral | 8.653519e+06 | 4.652763e+06 | 432675.934280 | False |
| 3 | SEO | 7.740904e+06 | 7.740904e+06 | 387045.193783 | False |
| 4 | Social Media | 7.542323e+06 | 1.154308e+07 | 377116.162458 | False |
Saved: outputs/tables/budget_plan.csv
Saved: outputs/tables/roi_mroi_matrix.csv